CSS v1.2.0: Phase A — parametric coverage additions by ichinaski · Pull Request #6938 · DataDog/system-tests

ichinaski · 2026-05-14T13:40:15Z

First slice of system-tests coverage for the gaps identified in the CSS v1.2.0 status report.

What's added

Four parametric tests in tests/parametric/test_library_tracestats.py:

Test	Spec	Asserts
`test_http_method_endpoint_TS011`	§5 ClientGroupedStats	`HTTPMethod` / `HTTPEndpoint` populated from `http.method` / `http.endpoint` / `http.route`
`test_payload_metadata_TS012`	§3 ClientStatsPayload	`Hostname`, `Env`, `Version`, `RuntimeID`, `Sequence` populated; `Service` present (payload-level or per-bucket)
`test_agent_populated_fields_empty_TS013`	§3 ClientStatsPayload	`ContainerID`, `Tags`, `ImageTag`, `AgentAggregation`, `ProcessTagsHash` absent (agent-populated)
`test_partial_version_excluded_TS014`	§7 Span Exclusions	Spans with `_dd.partial_version` set don't contribute to stats

A _find_raw_v06_stats helper reads the raw msgpack payload, since the decoded V06StatsAggr view is narrower than the spec.

Parametric harness fix (python)

utils/build/docker/python/parametric/apm_test_client/server.py — /trace/stats/flush was relying on SpanStatsProcessorV06, which dd-trace-py removed when CSS moved to libdatadog. The endpoint now falls back to writer.on_shutdown() + writer.recreate() when the legacy processor is absent. Old behavior preserved otherwise.

Per-SDK summary (review focus)

python — harness fix above; passes all four tests.
golang — TS012 marked missing_feature: payload-level RuntimeID not set (stats.go:96-103). Other three pass.
java — TS014 marked missing_feature: exclusion uses internal longRunningVersion (ConflatingMetricsAggregator.java:308), not the spec's _dd.partial_version.
dotnet — TS014 marked missing_feature: _dd.partial_version spans not excluded.
nodejs — five SDK gaps marked per-test (TS001/TS003/TS005/TS006/TS007/TS014); see manifests/nodejs.yml for details on each. dd-trace-js CSS is implemented but diverges from spec on Resource field, aggregation keys, error/success separation, P0 dropping, and partial_version filtering.
php — class-level missing_feature (>=1.19.0): dd-trace-php ships CSS but libdatadog's sidecar holds buckets <20s (buffer_len=2 × bucket_size=10s) and dd_trace_synchronous_flush passes force=false, so parametric tests never observe /v0.6/stats. No userland force-flush API exists.
rust / ruby / cpp — all four marked missing_feature: CSS not implemented in tracer.

Spec gaps surfaced (follow-ups, not addressed here)

dd-trace-go:

Payload-level RuntimeID never set (ddtrace/tracer/stats.go:96-103)
Payload-level Service never set; only per-bucket (stats.go:181)
HTTPEndpoint reads http.endpoint, inconsistent with OTel's http.route
/info version field not parsed in infoResponse
Configurable retry on stats send (stats.go:266) contradicts spec
gRPC status code extraction delegated to agent
api.errors metric not emitted from stats endpoint error path

dd-trace-js (Node.js): see per-test manifest reasons (Resource = span name, single bucket for errors/success, no P0 drop on sample_rate=0, no _dd.partial_version filter).

dd-trace-php: no userland force-flush API; bucket aging blocks short-lived tests.

dd-trace-java: TS014 uses internal exclusion field instead of spec's _dd.partial_version.

Phases B–E (follow-ups on this branch)

B — trace filters (filter_tags, filter_tags_regex, ignore_resources)
C — sampler interactions, extended obfuscation (Cassandra/Redis)
D — internal DogStatsD metrics
E — stretch goals (stochastic rounding, bucket end-time, header naming)

Full plan in css-spec-coverage-plan.md (out-of-repo).

Asserts that span.http.method and span.http.route metadata are populated as HTTPMethod and HTTPEndpoint in the /v0.6/stats payload, per CSS v1.2.0 spec §5. Mark nodejs, php, ruby, cpp as missing_feature since stats computation isn't implemented.

Asserts Hostname, Env, Version, Service, RuntimeID, and Sequence are populated in the /v0.6/stats payload per CSS v1.2.0 spec §3 (deployment-level identifiers). Mark nodejs, php, ruby, cpp as missing_feature since stats computation isn't implemented.

Asserts that ContainerID, Tags, ImageTag, AgentAggregation, and ProcessTagsHash are absent or empty in the tracer-sent /v0.6/stats payload, per CSS v1.2.0 spec §3 (these fields are agent-populated). Mark nodejs, php, ruby, cpp as missing_feature.

Asserts spans with the _dd.partial_version metric set are excluded from stats aggregation, per CSS v1.2.0 spec §7 (Span Exclusions). A control span without the metric must still produce stats. Mark nodejs, php, ruby, cpp as missing_feature.

github-actions · 2026-05-14T13:40:46Z

CODEOWNERS have been resolved as:

manifests/cpp.yml                                                       @DataDog/dd-trace-cpp
manifests/dotnet.yml                                                    @DataDog/apm-dotnet @DataDog/asm-dotnet
manifests/golang.yml                                                    @DataDog/dd-trace-go-guild
manifests/java.yml                                                      @DataDog/asm-java @DataDog/apm-java
manifests/nodejs.yml                                                    @DataDog/dd-trace-js
manifests/php.yml                                                       @DataDog/apm-php @DataDog/asm-php
manifests/ruby.yml                                                      @DataDog/ruby-guild @DataDog/asm-ruby
manifests/rust.yml                                                      @DataDog/apm-rust
tests/parametric/test_library_tracestats.py                             @DataDog/system-tests-core @DataDog/apm-sdk-capabilities
utils/build/docker/nodejs/parametric/server.js                          @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/python/parametric/apm_test_client/server.py          @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core

Test agent returns the /v0.6/stats request body as a base64-encoded str, not bytes. Mypy correctly flagged the annotation mismatch on TS011.

Split type-and-truthiness assertions into separate checks per ruff's pytest rule against compound assert statements.

datadog-prod-us1-6 · 2026-05-15T09:35:13Z

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 306dd8d | Docs | Datadog PR Page | Give us feedback!}

…ey fail CI revealed real per-SDK gaps: - python: parametric harness does not flush /v0.6/stats (TS011-TS014 all) - golang: HTTPEndpoint not populated from http.route; Hostname not set on payload (TS011, TS012) - dotnet: _dd.partial_version spans not excluded from stats (TS014) These reflect real implementation gaps in those SDKs (or the parametric harness), not test bugs — markers explain the gap per spec section.

Resolve manifest/dotnet.yml conflict on TS001 version (main bumped to <3.43.0). Replace ': ' with ' - ' in CSS v1.2.0 missing_feature reasons to avoid YAML parsing errors (colon was being interpreted as a mapping value).

Java fails TS011 (HTTPMethod=None), TS012 (Hostname=''), and TS014 (partial.snapshot not excluded) on both dev and prod. TS013 passes. Rust fails all 4 (parametric harness does not flush /v0.6/stats, same root cause as python).

TS011: was setting only http.route. dd-trace-go (and the spec field name) reads http.endpoint. Now sets both http.endpoint and http.route. Also adds DD_TRACE_RESOURCE_RENAMING_ENABLED=true so dd-trace-java's gate on HTTPMethod/HTTPEndpoint extraction (Config.java:2278) is on. TS012: tracers do not auto-detect hostname in the parametric harness; pin DD_HOSTNAME=test-host so the field is populated as the spec requires. Removed missing_feature markers from golang (TS011, TS012) and java (TS011, TS012) — those were test bugs, not implementation gaps. Java's TS014 marker remains: dd-trace-java's exclusion uses the internal longRunningVersion field, not the _dd.partial_version metric, which the spec mandates.

…ERVICE env Root cause of python failures: dd-trace-py >= 3.x delegates CSS to libdatadog's native TraceExporter. The exporter only flushes /v0.6/stats on its 10-second internal timer or on shutdown — the parametric server's /trace/stats/flush endpoint was still using the long-removed Python-side SpanStatsProcessorV06 and silently no-op'd. - Update the parametric server to fall back to writer.on_shutdown() + writer.recreate() when the legacy processor is absent. This deterministically flushes libdatadog stats at the end of each parametric test. - TS012 also needed DD_SERVICE (payload-level Service is the configured main service name per spec §3, not the per-span service). Added to library_env alongside DD_HOSTNAME. With these fixes, all four python CSS tests pass locally. Removing python missing_feature markers for TS011-TS014.

dd-trace-go option.go:297 and dd-trace-java Config.java:2005 only populate the Hostname field on the ClientStatsPayload when DD_TRACE_REPORT_HOSTNAME is on. Without it, both SDKs return empty Hostname even when DD_HOSTNAME is set.

dd-trace-go stats.go:181 only writes Service at the per-bucket StatSpanConfig level (the span's service), not at the payload level. The spec mandates Service in ClientStatsPayload. This is the same divergence already documented for Go on test_top_level_service in the e2e suite.

Two fixes informed by checking the trace-agent's actual use of these fields: 1. Service: the trace-agent uses payload-level ClientStatsPayload.Service only as a partition-key hint in PayloadAggregationKey.BaseService (client_stats_aggregator.go:178). The per-bucket ClientGroupedStats.Service is the spec-required source of truth that ends up at the backend. dd-trace-go intentionally writes Service only at the per-bucket level (stats.go:181). Accept either location. 2. Env: dd-trace-java's WellKnownTags does not apply the spec's 'unknown-env' default when DD_ENV is unset. Pin DD_ENV (plus DD_VERSION for completeness) so the assertion is deterministic across SDKs. Removed the Go TS012 marker — the test now passes for Go under the lenient Service assertion.

dd-trace-go stats.go:96-103 builds the PayloadAggregationKey without a RuntimeID field, so the /v0.6/stats payload sent to the agent has RuntimeID empty. The trace-agent passes RuntimeID through to the backend (writer/stats.go:408) for message-uniqueness/deduplication but doesn't aggregate by it. Functionally non-fatal, but it's a spec-mandated field.

Eldolfin · 2026-05-15T17:30:36Z

Relevant to phase B: #6952
And C: #6648

Both are still in draft because I'm waiting for at least one tracer to pass the tests (they do on local versions of dd-trace-py + libdatadog tho)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cbdda2e69a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

bwoebi · 2026-05-19T10:43:18Z

@ichinaski I see your AI wrote CSS not implemented in tracer - but that's not true. PHP implements it.
Maybe something does not quite work as specified? But at least it's implemented.

I see there are multiple stats computation testsuites, PHP implements tests/stats/test_stats.py, I didn't realize there's also a parametric equivalent.

ichinaski · 2026-05-19T10:54:59Z

@ichinaski I see your AI wrote CSS not implemented in tracer - but that's not true. PHP implements it. Maybe something does not quite work as specified? But at least it's implemented.

I see there are multiple stats computation testsuites, PHP implements tests/stats/test_stats.py, I didn't realize there's also a parametric equivalent.

Sorry for the noise, I am still assessing what the Client Libraries MCP is reporting about the status of v1.2.0 implementation. I will put together all the resources and share them for review via Slack. I will certainly need some help understanding each SDKs status.

The PHP system-tests parametric markers said 'php has not implemented stats computation yet', but dd-trace-php shipped CSS in v1.19.0 via a hybrid C+Rust+sidecar architecture. The blocker for the Phase A parametric tests is that the PHP parametric app's /trace/stats/flush handler is a no-op (utils/build/docker/php/parametric/server.php:262), not a missing SDK feature. Update the markers to reflect this.

Both SDKs ship CSS but their parametric servers had no-op flush handlers, so TS011-TS014 had no way to drive a stats send before assertions. - nodejs: drive tracer._tracer._processor._stats.onInterval() and capture the span-stats writer's _sendPayload callback so the response waits for the HTTP PUT to /v0.6/stats to complete. - php: call dd_trace_synchronous_flush(1000); per ext/ddtrace.c:3286-3310 that invokes ddog_sidecar_flush with traces_and_stats=true, which drains both span and stats buckets through the sidecar transport in one call.

After the parametric /trace/stats/flush fixes for nodejs and php, the Tracestats markers attributed to 'harness lacks stats flush' are stale. - nodejs: drop 11 'nodejs has not implemented stats computation yet' markers. Keep TS014 with corrected reason citing the real SDK gap (span_stats.js onSpanFinished only filters TOP_LEVEL_KEY/MEASURED, not _dd.partial_version). Keep TS008 (broken everywhere). - php: collapse 11 'PHP parametric app lacks stats-flush API' markers into one class-level Test_Library_Tracestats: v1.19.0 gate, matching the existing tests/stats/test_stats.py: v1.19.0 pattern. Keep TS008. - rust: correct the misleading 'rust parametric harness does not flush /v0.6/stats' reason on TS011-TS014. dd-trace-rs only has the DD_TRACE_STATS_COMPUTATION_ENABLED config flag and no aggregation or exporter, so the harness no-op is downstream of an SDK gap.

PHP CSS stats need bucket aging >20s (libdatadog buffer_len*bucket_size) or DD_TRACE_FORCE_FLUSH_ON_SHUTDOWN+process exit. dd_trace_synchronous_flush passes force=false to the sidecar, so the parametric pattern cannot observe /v0.6/stats. Restored PHP /trace/stats/flush to NOP and updated the manifest reason to reflect the architectural limitation instead of the v1.19.0 marker. Node.js: added per-test missing_feature markers for TS001/TS003/TS005/TS006/TS007 describing the dd-trace-js SDK gaps surfaced by the now-running tests.

cbeauchesne

From framework usage, all good. Good you get a review from someone familair with the tested feature ?

ichinaski added 4 commits May 14, 2026 15:24

ichinaski added 3 commits May 15, 2026 11:23

Fix mypy type annotation in _find_raw_v06_stats helper

ad5faa4

Test agent returns the /v0.6/stats request body as a base64-encoded str, not bytes. Mypy correctly flagged the annotation mismatch on TS011.

Apply ruff format to TS014

99bc47f

Fix ruff PT018 by splitting composite asserts in TS012

b8495ee

Split type-and-truthiness assertions into separate checks per ruff's pytest rule against compound assert statements.

ichinaski added 10 commits May 15, 2026 11:39

Trigger CI: empty commit after manifest markers

f3666aa

Merge origin/main into inigo/css-spec-coverage

bbda151

Resolve manifest/dotnet.yml conflict on TS001 version (main bumped to <3.43.0). Replace ': ' with ' - ' in CSS v1.2.0 missing_feature reasons to avoid YAML parsing errors (colon was being interpreted as a mapping value).

ichinaski marked this pull request as ready for review May 19, 2026 09:43

ichinaski requested review from a team as code owners May 19, 2026 09:43

ichinaski requested review from Anilm3, daniel-romano-DD, manuel-alvarez-alvarez, mtoffl01, quinna-h and rachelyangdog and removed request for a team May 19, 2026 09:43

amarziali requested a review from sarahchen6 May 19, 2026 09:45

chatgpt-codex-connector Bot reviewed May 19, 2026

View reviewed changes

Comment thread tests/parametric/test_library_tracestats.py

ichinaski added 4 commits May 20, 2026 12:28

cbeauchesne reviewed May 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CSS v1.2.0: Phase A — parametric coverage additions#6938

CSS v1.2.0: Phase A — parametric coverage additions#6938
ichinaski wants to merge 21 commits into
mainfrom
inigo/css-spec-coverage

ichinaski commented May 14, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 14, 2026 •

edited

Loading

Uh oh!

datadog-prod-us1-6 Bot commented May 15, 2026 •

edited by datadog-datadog-prod-us1-2 Bot

Loading

Uh oh!

Eldolfin commented May 15, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

bwoebi commented May 19, 2026 •

edited

Loading

Uh oh!

ichinaski commented May 19, 2026

Uh oh!

cbeauchesne left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

ichinaski commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What's added

Parametric harness fix (python)

Per-SDK summary (review focus)

Spec gaps surfaced (follow-ups, not addressed here)

Phases B–E (follow-ups on this branch)

Uh oh!

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

datadog-prod-us1-6 Bot commented May 15, 2026 • edited by datadog-datadog-prod-us1-2 Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Eldolfin commented May 15, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

bwoebi commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ichinaski commented May 19, 2026

Uh oh!

cbeauchesne left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ichinaski commented May 14, 2026 •

edited

Loading

github-actions Bot commented May 14, 2026 •

edited

Loading

datadog-prod-us1-6 Bot commented May 15, 2026 •

edited by datadog-datadog-prod-us1-2 Bot

Loading

bwoebi commented May 19, 2026 •

edited

Loading